

Section: Partnerships and Cooperations

National initiatives

Equipex ORTOLANG

  • Project acronym: ORTOLANG (http://www.ortolang.fr)

  • Project title: Open Resources and TOols for LANGuage

  • Duration: September 2012 - May 2016 (phase I, signed in January 2013)

  • Coordinator: ATILF (Nancy)

  • Other partners: LPL (Aix-en-Provence), LORIA (Nancy), Modyco (Paris), LLL (Orléans), INIST (Nancy)

  • Abstract: The aim of ORTOLANG (Open Resources and TOols for LANGuage) is to propose a network infrastructure offering a repository of language data (corpora, lexicons, dictionaries, etc.) together with readily available and well-documented tools for its processing, which will:

    • enable genuine pooling of research on the analysis, modeling and automatic processing of our language, bringing us up to the best international level;

    • facilitate the use and transfer of resources and tools developed in public laboratories to industrial partners, in particular to SMEs, which often cannot afford to develop such language-processing resources and tools because of the cost of their realization;

    • promote the French language and the regional languages of France by sharing the knowledge acquired by public laboratories.

    Several teams of the LORIA laboratory contribute to this Equipex, mainly by providing tools for speech and language processing, such as text-speech alignment, speech visualization, syntactic parsing, and annotation.

ANR ARTIS

  • Project acronym: ARTIS

  • Project title: Articulatory inversion of audiovisual speech for augmented speech (Inversion articulatoire de la parole audiovisuelle pour la parole augmentée)

  • Duration: January 2009 - June 2013

  • Coordinator: Yves Laprie (LORIA)

  • Other partners: Gipsa-Lab, LTCI, IRIT

  • Abstract: The main objective of ARTIS is to recover the temporal evolution of the vocal tract shape from the acoustic signal.

This contract started in January 2009 in collaboration with LTCI (Paris), Gipsa-Lab (Grenoble) and IRIT (Toulouse). Its main purpose is the acoustic-to-articulatory inversion of speech signals. Unlike in the European project ASPI, the approach followed in our group focuses on standard spectral input data, i.e. cepstral vectors. The objective of the project is to develop a demonstrator enabling the inversion of speech signals in the domain of second-language learning.

This year the work focused on the development of inversion from cepstral data as input. We particularly worked on the comparison of cepstral vectors calculated on natural speech with those obtained via the articulatory-to-acoustic mapping. Bilinear frequency warping was combined with an affine adaptation of the cepstral coefficients; together, these two adaptation strategies enable a very good recovery of vocal tract shapes from natural speech. The second topic studied was access to the codebook. Two pruning strategies, a simple one using the spectral peak corresponding to F2 and a more elaborate one applying lax dynamic programming to spectral peaks, enable very efficient access to the articulatory codebook used for inversion.
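The bilinear frequency warping of cepstral coefficients mentioned above can be sketched as follows. This is a minimal illustration, not the project's implementation: it reconstructs the log spectrum from a real cepstrum, resamples it along the first-order all-pass (bilinear) warping curve with parameter `alpha`, and converts back to cepstra. The function and parameter names are our own.

```python
import numpy as np

def bilinear_warp_cepstrum(c, alpha, n_freq=512):
    """Warp a real cepstral vector via a first-order all-pass (bilinear) map.

    Frequency-domain sketch: rebuild the log spectrum from the cepstrum,
    resample it on the warped frequency axis, then take the inverse cosine
    transform to obtain the warped cepstral coefficients.
    """
    omega = np.linspace(0.0, np.pi, n_freq)
    # Phase of the first-order all-pass filter: the bilinear warping curve,
    # monotonically increasing from 0 to pi for |alpha| < 1.
    warped = omega + 2.0 * np.arctan2(alpha * np.sin(omega),
                                      1.0 - alpha * np.cos(omega))
    n = np.arange(len(c))
    # Log spectrum from a real cepstrum: S(w) = c0 + 2 * sum_n c_n cos(n w).
    weights = np.where(n == 0, 1.0, 2.0)[:, None]
    S = c @ (weights * np.cos(np.outer(n, omega)))
    # Warped spectrum: evaluate S at the inverse-warped frequencies.
    S_warped = np.interp(omega, warped, S)
    # Back to cepstra: c_n = (1/pi) * integral over [0, pi] of S(w) cos(n w) dw,
    # computed here with a manual trapezoidal rule.
    integrand = S_warped * np.cos(np.outer(n, omega))
    dw = omega[1] - omega[0]
    return (integrand.sum(axis=1)
            - 0.5 * (integrand[:, 0] + integrand[:, -1])) * dw / np.pi
```

With `alpha = 0` the warp is the identity and the cepstrum is recovered unchanged; positive `alpha` stretches the low-frequency region, which is the usual way to adapt a synthetic vocal tract spectrum towards a natural speaker.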

This year the project also focused on articulatory synthesis, developing time patterns coordinating source and vocal tract dynamics in order to generate better consonant/vowel/consonant sequences.

ANR ViSAC

  • Project acronym: ViSAC

  • Project title: Acoustic-Visual Speech Synthesis by Bimodal Unit Concatenation

  • Duration: January 2009 - June 2013

  • Coordinator: Slim Ouni

  • Other partners: Magrit EPI (Inria)

  • Abstract: The main objective of ViSAC is bimodal (acoustic and visual) speech synthesis.

This contract started in January 2009 in collaboration with the Magrit Inria team. The purpose of this project is to develop synthesis techniques in which speech is considered as a bimodal signal whose acoustic and visual components are handled simultaneously. This is done by concatenating bimodal diphone units, that is, units comprising both acoustic and visual information. The latter is acquired using a stereovision technique. The proposed method addresses the problems of asynchrony and incoherence inherent in classic approaches to audiovisual synthesis. Unit selection is based on the classic target and join costs of acoustic-only synthesis, augmented with a visual join cost. During this final year of the project, we performed an extensive evaluation of the synthesis system using perceptual and subjective tests. The overall outcome indicates that the proposed bimodal acoustic-visual synthesis technique provides intelligible speech in both the acoustic and visual channels [22].
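The unit-selection scheme described above, with classic target and acoustic join costs augmented by a visual join cost, amounts to a Viterbi search over candidate bimodal units. The following is a hedged toy sketch, not the ViSAC implementation: the data structure, field names and cost weights are assumptions made for illustration.

```python
import numpy as np

def select_units(candidates, w_ac=1.0, w_vis=1.0):
    """Viterbi search over candidate units with acoustic and visual join costs.

    candidates[i] is the list of candidate units for position i; each unit is
    a dict with a precomputed target cost 'tgt' and boundary feature vectors
    'ac_in'/'ac_out' (acoustic) and 'vis_in'/'vis_out' (visual).
    Returns (path of chosen unit indices, total cost of the path).
    """
    costs = [np.array([u['tgt'] for u in candidates[0]], dtype=float)]
    back = []
    for i in range(1, len(candidates)):
        prev = costs[-1]
        cur, bp = [], []
        for u in candidates[i]:
            # Join cost = acoustic discontinuity + weighted visual discontinuity.
            join = np.array([w_ac * np.linalg.norm(p['ac_out'] - u['ac_in'])
                             + w_vis * np.linalg.norm(p['vis_out'] - u['vis_in'])
                             for p in candidates[i - 1]])
            total = prev + join
            k = int(np.argmin(total))
            cur.append(total[k] + u['tgt'])
            bp.append(k)
        costs.append(np.array(cur))
        back.append(bp)
    # Backtrack the cheapest path.
    j = int(np.argmin(costs[-1]))
    path = [j]
    for bp in reversed(back):
        j = bp[j]
        path.append(j)
    return path[::-1], float(np.min(costs[-1]))
```

Setting `w_vis = 0` recovers acoustic-only unit selection; increasing it penalizes visually discontinuous concatenations, which is the point of treating the signal bimodally.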

ANR ORFEO

In this project, we provided an automatic alignment at the word and phoneme levels for audio files from the TCOF corpus (Traitement de Corpus Oraux en Français). This corpus contains mainly spontaneous speech, recorded under various conditions with a wide range of signal-to-noise ratios and a large amount of overlapping speech. We tested different acoustic models and different adaptation methods for the forced alignment.
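Forced alignment of this kind is a constrained Viterbi search: the phone sequence is known from the transcript, and the algorithm only decides where the boundaries fall. A toy sketch, assuming per-frame log-likelihoods are already available (the actual work relied on trained acoustic models, not this simplification):

```python
import numpy as np

def force_align(loglik, phones):
    """Align a known phone sequence to frames by Viterbi search.

    loglik: array of shape (n_frames, n_phone_types) with per-frame
    log-likelihoods; phones: the transcript as a list of phone indices.
    Returns, for each frame, the position in the phone sequence it belongs to.
    Assumes n_frames >= len(phones).
    """
    T, N = loglik.shape[0], len(phones)
    dp = np.full((T, N), -np.inf)   # best score ending at (frame, position)
    bp = np.zeros((T, N), dtype=int)
    dp[0, 0] = loglik[0, phones[0]]
    for t in range(1, T):
        for j in range(N):
            stay = dp[t - 1, j]                       # remain in same phone
            move = dp[t - 1, j - 1] if j > 0 else -np.inf  # advance one phone
            if stay >= move:
                dp[t, j] = stay + loglik[t, phones[j]]
                bp[t, j] = j
            else:
                dp[t, j] = move + loglik[t, phones[j]]
                bp[t, j] = j - 1
    # Backtrack from the last frame, which must end on the last phone.
    j = N - 1
    seg = [j]
    for t in range(T - 1, 0, -1):
        j = int(bp[t, j])
        seg.append(j)
    return seg[::-1]
```

The frame ranges assigned to each position give the phone boundaries; word-level boundaries follow by grouping phone positions per word.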

ANR-DFG IFCASL

  • Project acronym: IFCASL

  • Project title: Individualized feedback in computer-assisted spoken language learning

  • Duration: March 2013 - February 2016

  • Coordinator: Jürgen Trouvain (Saarland University)

  • Other partners: Saarland University (COLI department)

  • Abstract: The main objective of IFCASL is to investigate learning of oral French by German speakers, and oral German by French speakers at the phonetic level.

The work has mainly focused on designing a corpus of French sentences and texts to be recorded by German speakers learning French, recording a corpus of German sentences read by French speakers, and developing tools for annotating the French and German corpora. Beforehand, two small preliminary corpora were designed and recorded in order to bring to the fore the most interesting phonetic issues to be investigated in the project. This preliminary work was also used to test the recording devices, so as to guarantee the same recording quality in Saarbrücken and in Nancy, and to design and develop the recording software.

In this project, we also provided an automatic alignment procedure at the word and phoneme levels for 4 corpora: French sentences uttered by French speakers, French sentences uttered by German speakers, German sentences uttered by French speakers, German sentences uttered by German speakers.

ANR ContNomina

  • Project acronym: ContNomina

  • Project title: Exploitation of context for proper name recognition in diachronic audio documents

  • Duration: February 2013 - July 2016

  • Coordinator: Irina Illina (Loria)

  • Other partners: LIA, Synalp

  • Abstract: The ContNomina project focuses on the problem of proper names in automatic audio processing systems, exploiting the context of the processed documents as efficiently as possible. To do this, the project will address:

    • the statistical modeling of contexts and of relationships between contexts and proper names;

    • the contextualization of the recognition module through dynamic adjustment of the lexicon and of the language model, in order to make them more accurate and more relevant in terms of lexical coverage, particularly with respect to proper names;

    • the detection of proper names, on the one hand in text documents, for building lists of proper names, and on the other hand in the output of the recognition system, to identify spoken proper names in the audio/video data.

FUI RAPSODIE

  • Project acronym: RAPSODIE (http://erocca.com/rapsodie)

  • Project title: Automatic Speech Recognition for Hard of Hearing or Handicapped People

  • Duration: March 2012 - February 2016 (signed in December 2012)

  • Coordinator: eRocca (Mieussy, Haute-Savoie)

  • Other partners: CEA (Grenoble), Inria (Nancy), CASTORAMA (France)

  • Abstract: The goal of the project is to realize a portable device that will help a hard-of-hearing person communicate with other people. To achieve this goal, the portable device will embed a speech recognition system adapted to this task. Another application of the device will be vocal control of the environment for handicapped persons.

    In this project, the PAROLE team is involved in optimizing the speech recognition models for the envisaged task, and also contributes to finding the best way of presenting the speech recognition results in order to maximize the communication efficiency between the hard-of-hearing person and the speaking person.

ADT FASST

The Action de Développement Technologique Inria (ADT) FASST (2012–2014) is conducted by PAROLE in collaboration with the PANAMA and TEXMEX teams of Inria Rennes. It aims to reimplement in efficient C++ code the Flexible Audio Source Separation Toolbox (FASST), originally developed in Matlab by A. Ozerov, E. Vincent and F. Bimbot in the METISS team of Inria Rennes. This will enable the application of FASST to larger data sets and its use by a wider audience. The new C++ version will be released in early 2014. The second year of the project will be devoted to the integration of FASST with speech recognition software in order to perform noise-robust speech recognition.

ADT VisArtico

The Action de Développement Technologique Inria (ADT) VisArtico started in November 2013 (11/2013 - 10/2015). The purpose of this project is to develop and improve VisArtico, an articulatory visualization software. In addition to improving the basic functionalities, several articulatory analysis and processing features will be integrated. We will also work on the integration of multimodal data.